AITopics | question network

Learning State Representations from Random Deep Action-conditional Predictions

Neural Information Processing SystemsApr-27-2026, 00:28:22 GMT

Our main contribution in this work is an empirical finding that random General Value Functions (GVFs), i.e., deep action-conditional predictions--random both in what feature of observations they predict as well as in the sequence of actions the predictions are conditioned upon--form good auxiliary tasks for reinforcement learning (RL) problems. In particular, we show that random deep action-conditional predictions when used as auxiliary tasks yield state representations that produce control performance competitive with state-of-the-art hand-crafted auxiliary tasks like value prediction, pixel control, and CURL in both Atari and DeepMind Lab tasks. In another set of experiments we stop the gradients from the RL part of the network to the state representation learning part of the network and show, perhaps surprisingly, that the auxiliary tasks alone are sufficient to learn state representations good enough to outperform an end-to-end trained actor-critic baseline.

machine learning, natural language, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Discovery of Useful Questions as Auxiliary Tasks

Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Janarthanan Rajendran, Richard L. Lewis, Junhyuk Oh, Hado P. van Hasselt, David Silver, Satinder Singh

Neural Information Processing SystemsFeb-11-2026, 12:15:33 GMT

Neural Information Processing Systems http://nips.cc/

agent, auxiliary task, representation, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > Canada > Alberta (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(3 more...)

Genre: Research Report (0.68)

Industry: Leisure & Entertainment > Games (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

c71df24045cfddab4a963d3ac9bdc9a3-Paper.pdf

Neural Information Processing SystemsFeb-11-2026, 02:52:49 GMT

prediction, representation, state representation, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(4 more...)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Add feedback

Discovery of Useful Questions as Auxiliary Tasks

Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Janarthanan Rajendran, Richard L. Lewis, Junhyuk Oh, Hado P. van Hasselt, David Silver, Satinder Singh

Neural Information Processing SystemsOct-2-2025, 03:10:28 GMT

Arguably, intelligent agents ought to be able to discover their own questions so that in learning answers for them they learn unanticipated useful knowledge and skills; this departs from the focus in much of machine learning on agents learning answers to externally defined questions. We present a novel method for a reinforcement learning (RL) agent to discover questions formulated as general value functions or GVFs, a fairly rich form of knowledge representation. Specifically, our method uses non-myopic meta-gradients to learn GVF-questions such that learning answers to them, as an auxiliary task, induces useful representations for the main task faced by the RL agent. We demonstrate that auxiliary tasks based on the discovered GVFs are sufficient, on their own, to build representations that support main task learning, and that they do so better than popular hand-designed auxiliary tasks from the literature. Furthermore, we show, in the context of Atari 2600 videogames, how such auxiliary tasks, meta-learned alongside the main task, can improve the data efficiency of an actor-critic agent.

auxiliary task, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.46)
North America > United States (0.28)

Genre: Research Report (0.88)

Industry: Leisure & Entertainment > Games > Computer Games (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Temporal Abstraction in Temporal-difference Networks

Neural Information Processing SystemsApr-6-2023, 15:16:48 GMT

The primary distinguishing feature of temporal-difference (TD) networks (Sutton & Tanner, 2005) is that they permit a general compositional specification of the goals of learning. The goals of learning are thought of as predictive questions being asked by the agent in the learning problem, such as "What will I see if I step forward and look right?" or "If I open the fridge, will I see a bottle of beer?" Seeing a bottle of beer is of course a complicated perceptual act. It might be thought of as obtaining a set of predictions about what would happen if certain reaching and grasping actions were taken, about what would happen if the bottle were opened and turned upside down, and of what the bottle would look like if viewed from various angles. To predict seeing a bottle of beer is thus to make a prediction about a set of other predictions. The target for the overall prediction is a composition in the mathematical sense of the first prediction with each of the other predictions. TD networks are the first framework for representing the goals of predictive learning in a compositional, machine-accessible form. Each node of a TD network represents an individual question--something to be predicted--and has associated with it a value representing an answer to the question--a prediction of that something. The questions are represented by a set of directed links between nodes.

node, prediction, td network, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Add feedback

Learning State Representations from Random Deep Action-conditional Predictions

Zheng, Zeyu, Veeriah, Vivek, Vuorio, Risto, Lewis, Richard, Singh, Satinder

arXiv.org Artificial IntelligenceFeb-9-2021

In this work, we study auxiliary prediction tasks defined by temporal-difference networks (TD networks); these networks are a language for expressing a rich space of general value function (GVF) prediction targets that may be learned efficiently with TD. Through analysis in an illustrative domain we show the benefits to learning state representations of exploiting the full richness of TD networks, including both action-conditional predictions and temporally deep predictions. Our main (and perhaps surprising) result is that deep action-conditional TD networks with random structures that create random prediction-questions about random features yield state representations that are competitive with state-of-the-art hand-crafted value prediction and pixel control auxiliary tasks in both Atari games and DeepMind Lab tasks. We also show through stop-gradient experiments that learning the state representations solely via these unsupervised random TD network prediction tasks yield agents that outperform the end-to-end-trained actor-critic baseline.

machine learning, natural language, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2102.04897

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
(11 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.56)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Discovery of Useful Questions as Auxiliary Tasks

Veeriah, Vivek, Hessel, Matteo, Xu, Zhongwen, Lewis, Richard, Rajendran, Janarthanan, Oh, Junhyuk, van Hasselt, Hado, Silver, David, Singh, Satinder

arXiv.org Artificial IntelligenceSep-10-2019

Arguably, intelligent agents ought to be able to discover their own questions so that in learning answers for them they learn unanticipated useful knowledge and skills; this departs from the focus in much of machine learning on agents learning answers to externally defined questions. We present a novel method for a reinforcement learning (RL) agent to discover questions formulated as general value functions or GVFs, a fairly rich form of knowledge representation. Specifically, our method uses non-myopic meta-gradients to learn GVF-questions such that learning answers to them, as an auxiliary task, induces useful representations for the main task faced by the RL agent. We demonstrate that auxiliary tasks based on the discovered GVFs are sufficient, on their own, to build representations that support main task learning, and that they do so better than popular hand-designed auxiliary tasks from the literature. Furthermore, we show, in the context of Atari 2600 videogames, how such auxiliary tasks, meta-learned alongside the main task, can improve the data efficiency of an actor-critic agent.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1909.04607

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Temporal-Difference Networks for Dynamical Systems with Continuous Observations and Actions

Vigorito, Christopher M.

arXiv.org Machine LearningMay-9-2012

Temporal-difference (TD) networks are a class of predictive state representations that use well-established TD methods to learn models of partially observable dynamical systems. Previous research with TD networks has dealt only with dynamical systems with finite sets of observations and actions. We present an algorithm for learning TD network representations of dynamical systems with continuous observations and actions. Our results show that the algorithm is capable of learning accurate and robust models of several noisy continuous dynamical systems. The algorithm presented here is the first fully incremental method for learning a predictive representation of a continuous dynamical system.

machine learning, prediction, reinforcement learning, (21 more...)

arXiv.org Machine Learning

1205.2608

Country: North America > United States > Massachusetts (0.46)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Scientific Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Temporal Abstraction in Temporal-difference Networks

Rafols, Eddie, Koop, Anna, Sutton, Richard S.

Neural Information Processing SystemsDec-31-2006

We present a generalization of temporal-difference networks to include temporally abstract options on the links of the question network. Temporal-difference (TD) networks have been proposed as a way of representing and learning a wide variety of predictions about the interaction between an agent and its environment. These predictions are compositional in that their targets are defined in terms of other predictions, and subjunctive in that that they are about what would happen if an action or sequence of actions were taken. In conventional TD networks, the interrelated predictions are at successive time steps and contingent on a single action; here we generalize them to accommodate extended time intervals and contingency on whole ways of behaving. Our generalization is based on the options framework for temporal abstraction. The primary contribution of this paper is to introduce a new algorithm for intra-option learning in TD networks with function approximation and eligibility traces.

agent, prediction, td network, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Temporal Abstraction in Temporal-difference Networks

Rafols, Eddie, Koop, Anna, Sutton, Richard S.

Neural Information Processing SystemsDec-31-2006

We present a generalization of temporal-difference networks to include temporally abstract options on the links of the question network. Temporal-difference (TD) networks have been proposed as a way of representing and learning a wide variety of predictions about the interaction between an agent and its environment. These predictions are compositional in that their targets are defined in terms of other predictions, and subjunctive in that that they are about what would happen if an action or sequence of actions were taken. In conventional TD networks, the interrelated predictions are at successive time steps and contingent on a single action; here we generalize them to accommodate extended time intervals and contingency on whole ways of behaving. Our generalization is based on the options framework for temporal abstraction. The primary contribution of this paper is to introduce a new algorithm for intra-option learning in TD networks with function approximation and eligibility traces.

agent, prediction, td network, (17 more...)

Neural Information Processing Systems

Country: